The NYC restaurant inspections dataset has 397,584 observations and 18
variables. Some observations have no dba (the establishment's "doing
business as" name); these have been filtered out of later analysis. Some
observations have an inspection date of 1900-01-01, which is clearly a
placeholder rather than a real inspection date. These were also filtered
out of the subsequent analysis of critical violations by establishment
over the years.
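A minimal base R sketch of the two exclusion rules above (the analysis itself uses dplyr). It assumes that a missing name shows up as either `NA` or an empty string, which the text does not specify. The toy data frame is hypothetical; only the column names `dba` and `inspection_date` are taken from `rest_inspec`.

```r
# Hypothetical toy records; column names mirror rest_inspec
toy <- data.frame(
  dba = c("JOE'S PIZZA", NA, "", "CAFE X"),
  inspection_date = as.POSIXct(
    c("2016-02-03", "2015-03-17", "2017-01-05", "1900-01-01"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Placeholder date that does not correspond to a real inspection
placeholder <- as.POSIXct("1900-01-01", tz = "UTC")

# Keep rows that have a business name AND a real inspection date
has_name <- !is.na(toy$dba) & nzchar(toy$dba)
has_real_date <- toy$inspection_date != placeholder
clean <- toy[has_name & has_real_date, ]

nrow(clean)
```

Checking `is.na()` before `nzchar()` matters: `nzchar(NA)` returns `TRUE`, so testing only for empty strings would let `NA` names through.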
data("rest_inspec")
summary(rest_inspec)
## action boro building camis
## Length:397584 Length:397584 Length:397584 Min. :30075445
## Class :character Class :character Class :character 1st Qu.:41227319
## Mode :character Mode :character Mode :character Median :41622444
## Mean :44534756
## 3rd Qu.:50011150
## Max. :50071063
##
## critical_flag cuisine_description dba
## Length:397584 Length:397584 Length:397584
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## inspection_date inspection_type phone
## Min. :1900-01-01 00:00:00.00 Length:397584 Length:397584
## 1st Qu.:2015-03-17 00:00:00.00 Class :character Class :character
## Median :2016-02-03 00:00:00.00 Mode :character Mode :character
## Mean :2015-09-27 22:03:03.41
## 3rd Qu.:2016-12-13 00:00:00.00
## Max. :2017-10-17 00:00:00.00
##
## record_date score street
## Min. :2017-10-19 06:00:49.00 Min. : -2.00 Length:397584
## 1st Qu.:2017-10-19 06:00:49.00 1st Qu.: 11.00 Class :character
## Median :2017-10-19 06:00:49.00 Median : 15.00 Mode :character
## Mean :2017-10-19 06:00:49.06 Mean : 18.93
## 3rd Qu.:2017-10-19 06:00:49.00 3rd Qu.: 24.00
## Max. :2017-10-19 06:00:59.00 Max. :151.00
## NA's :22642
## violation_code violation_description zipcode grade
## Length:397584 Length:397584 Min. :10001 Length:397584
## Class :character Class :character 1st Qu.:10022 Class :character
## Mode :character Mode :character Median :10468 Mode :character
## Mean :10675
## 3rd Qu.:11229
## Max. :11697
## NA's :5
## grade_date
## Min. :2012-05-01 00:00:00.00
## 1st Qu.:2015-03-30 00:00:00.00
## Median :2016-02-17 00:00:00.00
## Mean :2016-01-31 05:45:17.54
## 3rd Qu.:2016-12-13 00:00:00.00
## Max. :2017-10-17 00:00:00.00
## NA's :204287
rest_inspec |>
filter(grade == "C" & critical_flag == "Critical")
## # A tibble: 4,623 × 18
## action boro building camis critical_flag cuisine_description dba
## <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 Violations wer… MANH… 365 4.14e7 Critical Asian ALPH…
## 2 Violations wer… MANH… 370 5.00e7 Critical Asian HOA …
## 3 Violations wer… MANH… 11 4.17e7 Critical Korean MARU
## 4 Violations wer… MANH… 537 5.00e7 Critical Café/Coffee/Tea CORS…
## 5 Violations wer… MANH… 35 4.13e7 Critical Korean MADA…
## 6 Violations wer… MANH… 150 5.00e7 Critical American MADI…
## 7 Violations wer… MANH… 312 4.14e7 Critical American CAFE…
## 8 Violations wer… MANH… 249 5.01e7 Critical American PARS…
## 9 Violations wer… MANH… 229 5.00e7 Critical Chinese GRAN…
## 10 Violations wer… MANH… 0 4.06e7 Critical American KABO…
## # ℹ 4,613 more rows
## # ℹ 11 more variables: inspection_date <dttm>, inspection_type <chr>,
## # phone <chr>, record_date <dttm>, score <int>, street <chr>,
## # violation_code <chr>, violation_description <chr>, zipcode <int>,
## # grade <chr>, grade_date <dttm>
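rest_inspec |>
filter(boro != "Missing") |>
mutate(
boro = factor(boro),
# fix the left-to-right order of the boroughs on the x-axis
boro = fct_relevel(boro, c("STATEN ISLAND", "BRONX", "QUEENS", "MANHATTAN", "BROOKLYN"))
) |>
# plotly skips rows with a missing score, which produces the warning below
plot_ly(x = ~boro, y = ~score, color = ~boro, type = "violin", colors = "viridis")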
## Warning: Ignoring 22636 observations
This code chunk counts the critical reports for each business and sorts
them in descending order to find the most frequently cited businesses.
Dunkin’ Donuts and Dunkin’ Donuts/Baskin Robbins appear as separate
entries, so the combined entry is merged into Dunkin’ Donuts. head() then
keeps the 20 worst offenders.
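The same recode, count, sort and truncate logic can be sketched in base R on a few hypothetical rows; the analysis itself uses dplyr's `case_match()`, `group_by()` and `summarize()`. Only the top 2 are kept here, instead of 20, because the toy data is small.

```r
# Hypothetical critical-violation records (one row per violation)
crit <- data.frame(
  dba = c("DUNKIN' DONUTS", "DUNKIN' DONUTS, BASKIN ROBBINS",
          "SUBWAY", "SUBWAY", "DUNKIN' DONUTS", "MCDONALD'S"),
  stringsAsFactors = FALSE
)

# Fold the combined-brand entry into the plain Dunkin' Donuts entry
crit$dba[crit$dba == "DUNKIN' DONUTS, BASKIN ROBBINS"] <- "DUNKIN' DONUTS"

# Count reports per business, sort descending, keep the worst n
counts <- sort(table(crit$dba), decreasing = TRUE)
worst <- head(counts, n = 2)
worst
```

Recoding before counting is the simpler order: counting first would leave two Dunkin’ rows whose totals would have to be summed afterwards.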
rest_inspec |>
filter(critical_flag == "Critical") |>
mutate(
dba = case_match(
dba,
"DUNKIN' DONUTS, BASKIN ROBBINS" ~ "DUNKIN' DONUTS",
.default = dba
)
) |>
summary()
## action boro building camis
## Length:218913 Length:218913 Length:218913 Min. :30075445
## Class :character Class :character Class :character 1st Qu.:41232846
## Mode :character Mode :character Mode :character Median :41624158
## Mean :44549022
## 3rd Qu.:50011159
## Max. :50070808
##
## critical_flag cuisine_description dba
## Length:218913 Length:218913 Length:218913
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## inspection_date inspection_type phone
## Min. :2012-05-01 00:00:00.00 Length:218913 Length:218913
## 1st Qu.:2015-03-25 00:00:00.00 Class :character Class :character
## Median :2016-02-10 00:00:00.00 Mode :character Mode :character
## Mean :2016-01-29 17:45:27.47
## 3rd Qu.:2016-12-20 00:00:00.00
## Max. :2017-10-17 00:00:00.00
##
## record_date score street
## Min. :2017-10-19 06:00:49 Min. : -2.00 Length:218913
## 1st Qu.:2017-10-19 06:00:49 1st Qu.: 12.00 Class :character
## Median :2017-10-19 06:00:49 Median : 17.00 Mode :character
## Mean :2017-10-19 06:00:49 Mean : 20.76
## 3rd Qu.:2017-10-19 06:00:49 3rd Qu.: 26.00
## Max. :2017-10-19 06:00:49 Max. :151.00
##
## violation_code violation_description zipcode grade
## Length:218913 Length:218913 Min. :10001 Length:218913
## Class :character Class :character 1st Qu.:10022 Class :character
## Mode :character Mode :character Median :10467 Mode :character
## Mean :10672
## 3rd Qu.:11229
## Max. :11697
##
## grade_date
## Min. :2012-05-01 00:00:00.00
## 1st Qu.:2015-03-31 00:00:00.00
## Median :2016-02-17 00:00:00.00
## Mean :2016-01-30 12:45:31.42
## 3rd Qu.:2016-12-09 00:00:00.00
## Max. :2017-10-17 00:00:00.00
## NA's :118118
rest_inspec |>
filter(critical_flag == "Critical") |>
mutate(
dba = case_match(
dba,
"DUNKIN' DONUTS, BASKIN ROBBINS" ~ "DUNKIN' DONUTS",
.default = dba
)
) |>
group_by(dba) |>
summarize(critical_reports = n()) |>
arrange(desc(critical_reports)) |>
head(n = 20) |>
mutate(
dba = factor(dba),
dba = fct_reorder(dba, critical_reports)
) |>
plot_ly(x = ~dba, y = ~critical_reports, color = ~dba, type = "bar", colors = "viridis")
rest_inspec |>
filter(critical_flag == "Critical") |>
mutate(
dba = case_match(
dba,
"DUNKIN' DONUTS, BASKIN ROBBINS" ~ "DUNKIN' DONUTS",
.default = dba
),
year = year(inspection_date)
) |>
group_by(dba, year) |>
# drop the grouping so fct_reorder() below orders businesses across the
# whole table rather than within each dba group
summarize(critical_reports = n(), .groups = "drop") |>
arrange(desc(critical_reports), year) |>
head(n = 1000) |>
mutate(
dba = factor(dba),
dba = fct_reorder(dba, critical_reports)
) |>
plot_ly(x = ~dba, y = ~critical_reports, color = ~dba, type = "bar", colors = "viridis")
This code chunk selects the establishments that were closed (or re-closed) among those with critical reports, by matching "Closed" or "closed" in the action field.
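The character class `[Cc]` in `[Cc]losed` accepts either capitalization, so a capitalized "Closed" and a lowercase "re-closed" both match. Below is a base R sketch using `grepl()`, the base counterpart of stringr's `str_detect()`. The action strings are hypothetical and may not match the dataset's exact wording.

```r
# Hypothetical action strings; real values in rest_inspec$action may differ
actions <- c(
  "Establishment Closed.",
  "Establishment re-closed.",
  "Violations were cited in the following area(s).",
  "No violations were recorded at the time of this inspection."
)
critical <- c(TRUE, TRUE, TRUE, FALSE)

# Both conditions must hold, as in the filter() call
closed <- critical & grepl("[Cc]losed", actions)
which(closed)
```

One caveat: `[Cc]losed` would also match words such as "disclosed". If that is a concern, `\\b[Cc]losed` adds a word boundary and still catches "re-closed", because the hyphen counts as a boundary.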
rest_inspec |>
filter(critical_flag == "Critical" & str_detect(action, "[Cc]losed")) |>
summary()
## action boro building camis
## Length:7013 Length:7013 Length:7013 Min. :40364179
## Class :character Class :character Class :character 1st Qu.:41353598
## Mode :character Mode :character Mode :character Median :41719174
## Mean :45622097
## 3rd Qu.:50035038
## Max. :50070321
##
## critical_flag cuisine_description dba
## Length:7013 Length:7013 Length:7013
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## inspection_date inspection_type phone
## Min. :2013-03-08 00:00:00.00 Length:7013 Length:7013
## 1st Qu.:2015-05-28 00:00:00.00 Class :character Class :character
## Median :2016-07-25 00:00:00.00 Mode :character Mode :character
## Mean :2016-05-01 03:51:49.25
## 3rd Qu.:2017-06-05 00:00:00.00
## Max. :2017-10-17 00:00:00.00
##
## record_date score street
## Min. :2017-10-19 06:00:49 Min. : 0.00 Length:7013
## 1st Qu.:2017-10-19 06:00:49 1st Qu.: 41.00 Class :character
## Median :2017-10-19 06:00:49 Median : 51.00 Mode :character
## Mean :2017-10-19 06:00:49 Mean : 53.57
## 3rd Qu.:2017-10-19 06:00:49 3rd Qu.: 65.00
## Max. :2017-10-19 06:00:49 Max. :151.00
##
## violation_code violation_description zipcode grade
## Length:7013 Length:7013 Min. :10001 Length:7013
## Class :character Class :character 1st Qu.:10027 Class :character
## Mode :character Mode :character Median :11106 Mode :character
## Mean :10737
## 3rd Qu.:11232
## Max. :11694
##
## grade_date
## Min. :NA
## 1st Qu.:NA
## Median :NA
## Mean :NaN
## 3rd Qu.:NA
## Max. :NA
## NA's :7013
rest_inspec |>
filter(critical_flag == "Critical" & str_detect(action, "[Cc]losed")) |>
group_by(cuisine_description) |>
summarize(closed = n()) |>
arrange(desc(closed)) |>
mutate(
cuisine_description = factor(cuisine_description),
cuisine_description = fct_reorder(cuisine_description, closed)
) |>
plot_ly(x = ~cuisine_description, y = ~closed, color = ~cuisine_description, type = "bar", colors = "viridis")
rest_inspec |>
filter(critical_flag == "Critical" & str_detect(action, "[Cc]losed")) |>
# distribution of closure-related critical records over time
plot_ly(x = ~inspection_date, type = "histogram")